Convergence of Optimistic and Incremental Q-Learning

Authors

  • Eyal Even-Dar
  • Yishay Mansour
Abstract

We show the convergence of two deterministic variants of Q-learning. The first is the widely used optimistic Q-learning, which initializes the Q-values to large initial values and then follows a greedy policy with respect to the Q-values. We show that setting the initial value sufficiently large guarantees convergence to an ε-optimal policy. The second is a new algorithm, incremental Q-learning, which gradually promotes the values of actions that are not taken. We show that incremental Q-learning converges, in the limit, to the optimal policy. Our incremental Q-learning algorithm can be viewed as a derandomization of ε-greedy Q-learning.
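The optimistic variant described above can be sketched in tabular form: initialize every Q-value to an upper bound on the discounted return, then act purely greedily, letting the updates themselves drive exploration. This is a minimal illustrative sketch, not the paper's exact algorithm; the toy environment, reward bound `r_max`, and constant learning rate `alpha` are assumptions for demonstration.

```python
import numpy as np

def optimistic_q_learning(n_states, n_actions, step, r_max=1.0,
                          gamma=0.9, alpha=0.1, episodes=500, horizon=50):
    # Optimistic initialization: r_max / (1 - gamma) upper-bounds the
    # discounted return, so every untried action looks worth taking.
    q = np.full((n_states, n_actions), r_max / (1.0 - gamma))
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = int(np.argmax(q[s]))        # greedy w.r.t. optimistic Q
            s_next, r = step(s, a)          # deterministic transition (assumed)
            target = r + gamma * np.max(q[s_next])
            q[s, a] += alpha * (target - q[s, a])
            s = s_next
    return q

# Toy deterministic MDP: action a moves to state a; action 1 pays reward 1.
def step(s, a):
    return a, (1.0 if a == 1 else 0.0)

q = optimistic_q_learning(n_states=2, n_actions=2, step=step)
```

Because all Q-values start at the upper bound, the greedy policy tries an action, sees its value decrease toward the true return, and switches to the next still-optimistic action; here the learned greedy policy ends up choosing action 1 in both states.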


Similar articles

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new classes of intrusions that do not exist in the training dataset for incremental misuse detection. As...

Full text

On the Convergence of Optimistic Policy Iteration

We consider a finite-state Markov decision problem and establish the convergence of a special case of optimistic policy iteration that involves Monte Carlo estimation of Q-values, in conjunction with greedy policy selection. We provide convergence results for a number of algorithmic variations, including one that involves temporal difference learning (bootstrapping) instead of Monte Carlo estim...

Full text

Towards Finite-Sample Convergence of Direct Reinforcement Learning

While direct, model-free reinforcement learning often performs better than model-based approaches in practice, only the latter have yet supported theoretical guarantees for finite-sample convergence. A major difficulty in analyzing the direct approach in an online setting is the absence of a definitive exploration strategy. We extend the notion of admissibility to direct reinforcement learning ...

Full text

Further study on $L$-fuzzy Q-convergence structures

In this paper, we discuss the equivalent conditions of pretopological and topological $L$-fuzzy Q-convergence structures and define $T_{0},~T_{1},~T_{2}$-separation axioms in $L$-fuzzy Q-convergence space. Furthermore, the $L$-ordered Q-convergence structure is introduced and its relation with the $L$-fuzzy Q-convergence structure is studied in a categorical sense.

Full text

Stratified $(L,M)$-fuzzy Q-convergence spaces

This paper presents the concepts of $(L,M)$-fuzzy Q-convergence spaces and stratified $(L,M)$-fuzzy Q-convergence spaces. It is shown that the category of stratified $(L,M)$-fuzzy Q-convergence spaces is a bireflective subcategory of the category of $(L,M)$-fuzzy Q-convergence spaces, and the former is a Cartesian-closed topological category. Also, it is proved that the category of stratified $...

Full text


Publication date: 2001